Few Shot Network Compression via Cross Distillation
Model compression has been widely adopted to obtain light-weighted deep
neural networks. Most prevalent methods, however, require fine-tuning with
sufficient training data to ensure accuracy, which could be challenged by
privacy and security issues. As a compromise between privacy and performance,
in this paper we investigate few shot network compression: given few samples
per class, how can we effectively compress the network with negligible
performance drop? The core challenge of few shot network compression lies in
the high estimation errors of the original network during inference, since the
compressed network easily overfits the few training instances. These
estimation errors propagate and accumulate layer by layer and ultimately
deteriorate the network output. To address the problem, we propose cross
distillation, a novel layer-wise knowledge distillation approach. By
interweaving the hidden layers of the teacher and student networks, the
layer-wise accumulation of estimation errors can be effectively reduced. The
proposed method
offers a general framework compatible with prevalent network compression
techniques such as pruning. Extensive experiments on benchmark datasets
demonstrate that cross distillation can significantly improve the student
network's accuracy when only a few training instances are available.
Comment: AAAI 2020
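As a rough illustration of the interweaving idea, the sketch below alternates teacher and student hidden states at each layer and penalizes their divergence; the layer interfaces, the MSE criterion, and the mixing weight `mu` are assumptions for illustration, not the paper's exact formulation.

```python
import torch.nn.functional as F

def cross_distillation_loss(teacher_layers, student_layers, x, mu=0.6):
    """Layer-wise cross distillation sketch: the teacher's hidden state
    is fed through the student layer (correction) while the student's
    own path is pulled toward the teacher's output (imitation), so
    estimation errors are reduced at every layer rather than being
    allowed to accumulate."""
    h_t, h_s = x, x
    loss = 0.0
    for f_t, f_s in zip(teacher_layers, student_layers):
        target = f_t(h_t).detach()        # teacher's next hidden state
        corrected = f_s(h_t.detach())     # student layer on the teacher path
        imitated = f_s(h_s)               # student layer on its own path
        loss = loss + mu * F.mse_loss(corrected, target) \
                    + (1 - mu) * F.mse_loss(imitated, target)
        h_t, h_s = target, imitated.detach()  # advance both paths
    return loss
```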
The Fast and the Private: Task-based Dataset Search
Modern dataset search platforms employ ML task-based utility metrics instead
of relying on metadata-based keywords to comb through extensive dataset
repositories. In this setup, requesters provide an initial dataset, and the
platform identifies complementary datasets to augment (join or union) the
requester's dataset such that the ML model (e.g., linear regression)
performance is improved the most. Although effective, current task-based data
searches are hampered by (1) high latency that deters users, (2) privacy
concerns raised by regulatory standards, and (3) low data quality that yields
little utility. We introduce Mileena, a fast, private, and high-quality
task-based
dataset search platform. At its heart, Mileena is built on pre-computed
semi-ring sketches for efficient ML training and evaluation. Building on these
sketches, we develop a novel Factorized Privacy Mechanism that makes the
search differentially private and scales to arbitrary corpus sizes and numbers
of requests without major quality degradation. We also demonstrate the early
promise of using LLM-based agents for automatic data transformation and of
applying semi-rings to support causal discovery and treatment effect
estimation.
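To make the semi-ring idea concrete, here is a minimal sketch (not Mileena's actual encoding) of how linear-regression sufficient statistics form an additive aggregate: sketches of dataset partitions can be merged and a model fit without revisiting raw rows.

```python
import numpy as np

def sketch(X, y):
    """Sufficient statistics for (ridge) linear regression. Adding two
    sketches corresponds to unioning the underlying datasets, which is
    what makes pre-computation and fast search possible."""
    return {"XtX": X.T @ X, "Xty": X.T @ y, "n": len(X)}

def merge(a, b):
    # union of two partitions = element-wise sum of their sketches
    return {k: a[k] + b[k] for k in a}

def fit(s, lam=1e-3):
    # closed-form ridge solution computed from the aggregate alone
    d = s["XtX"].shape[0]
    return np.linalg.solve(s["XtX"] + lam * np.eye(d), s["Xty"])
```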
Multi-scale Attention Flow for Probabilistic Time Series Forecasting
Probabilistic forecasting of multivariate time series is a notoriously
challenging but practical task. On the one hand, the challenge is how to
effectively capture the cross-series correlations between interacting time
series, to achieve accurate distribution modeling. On the other hand, we should
consider how to capture the contextual information within time series more
accurately to model the multivariate temporal dynamics of time series. In this
work, we propose a novel non-autoregressive deep learning model, called
Multi-scale Attention Normalizing Flow (MANF), which integrates multi-scale
attention with relative position information and represents the multivariate
data distribution by a conditional normalizing flow. Additionally,
compared with autoregressive modeling methods, our model avoids the influence
of cumulative error and does not increase the time complexity. Extensive
experiments demonstrate that our model achieves state-of-the-art performance on
many popular multivariate datasets.
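As a sketch of the flow component only (the multi-scale attention encoder is omitted, and all layer sizes are assumptions), a single conditional affine coupling step could look like this; MANF's actual architecture may differ.

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """One conditional affine coupling layer: half the variables are
    transformed with a scale/shift predicted from the other half plus
    a context vector c (e.g., the attention encoding of past series)."""
    def __init__(self, dim, cond_dim, hidden=64):  # dim assumed even
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim // 2 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),          # predicts scale and shift
        )

    def forward(self, x, c):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(torch.cat([x1, c], dim=-1)).chunk(2, dim=-1)
        s = torch.tanh(s)                    # keep the log-scale bounded
        y2 = x2 * s.exp() + t
        log_det = s.sum(dim=-1)              # exact Jacobian log-determinant
        return torch.cat([x1, y2], dim=-1), log_det
```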
Modeling the Resource Requirements of Convolutional Neural Networks on Mobile Devices
Convolutional Neural Networks (CNNs) have revolutionized the research in
computer vision, due to their ability to capture complex patterns, resulting in
high inference accuracies. However, the increasingly complex nature of these
neural networks means that they are particularly suited for server computers
with powerful GPUs. We envision that deep learning applications will be
eventually and widely deployed on mobile devices, e.g., smartphones,
self-driving cars, and drones. Therefore, in this paper, we aim to understand
the resource requirements (time, memory) of CNNs on mobile devices. First, by
deploying several popular CNNs on mobile CPUs and GPUs, we measure and analyze
the performance and resource usage for every layer of the CNNs. Our findings
point out the potential ways of optimizing the performance on mobile devices.
Second, we model the resource requirements of the different CNN computations.
Finally, based on the measurement, profiling, and modeling, we build and
evaluate our modeling tool, Augur, which takes a CNN configuration (descriptor)
as the input and estimates the compute time and resource usage of the CNN, to
give insights about whether and how efficiently a CNN can be run on a given
mobile platform. In doing so, Augur tackles several challenges: (i) how to
overcome profiling and measurement overhead; (ii) how to capture the variance
in different mobile platforms with different processors, memory, and cache
sizes; and (iii) how to account for the variance in the number, type, and size
of layers of the different CNN configurations.
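A hook-based sketch of the per-layer measurement step is shown below; the paper's exact methodology is not reproduced here, and the synchronization points and leaf-module filtering are assumptions.

```python
import time
import torch

def profile_layers(model, x):
    """Measure per-layer forward latency with hooks. On GPU, we must
    synchronize before reading the clock, otherwise asynchronous
    kernel launches make the timings meaningless."""
    starts, timings, handles = {}, {}, []

    def pre(name):
        def fn(module, inp):
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            starts[name] = time.perf_counter()
        return fn

    def post(name):
        def fn(module, inp, out):
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            timings[name] = time.perf_counter() - starts[name]
        return fn

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:   # leaf layers only
            handles.append(module.register_forward_pre_hook(pre(name)))
            handles.append(module.register_forward_hook(post(name)))
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return timings                              # {layer name: seconds}
```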
SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling
Time series analysis is widely used across many areas. Recently, to reduce
labeling expenses and benefit various tasks, self-supervised pre-training has
attracted immense interest. One mainstream paradigm is masked modeling, which
successfully pre-trains deep models by learning to reconstruct the masked
content based on the unmasked part. However, since the semantic information of
time series is mainly contained in temporal variations, the standard way of
randomly masking a portion of time points will seriously ruin vital temporal
variations of time series, making the reconstruction task too difficult to
guide representation learning. We thus present SimMTM, a Simple pre-training
framework for Masked Time-series Modeling. By relating masked modeling to
manifold learning, SimMTM proposes to recover masked time points by the
weighted aggregation of multiple neighbors outside the manifold, which eases
the reconstruction task by assembling ruined but complementary temporal
variations from multiple masked series. SimMTM further learns to uncover the
local structure of the manifold, which is helpful for masked modeling.
Experimentally, SimMTM achieves state-of-the-art fine-tuning performance
compared to the most advanced time series pre-training methods in two canonical
time series analysis tasks: forecasting and classification, covering both in-
and cross-domain settings.
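The sketch below illustrates the core aggregation idea: each masked view of a series is reconstructed from a similarity-weighted combination of its sibling views. The encoder, masking scheme, and temperature are assumptions, not SimMTM's exact design.

```python
import torch
import torch.nn.functional as F

def neighbor_aggregation(reprs, temperature=0.1):
    """Weighted aggregation over multiple masked views of one series.
    reprs: (n_views, d) representations of the masked copies. Each view
    is rebuilt from its neighbors, assembling ruined but complementary
    temporal variations."""
    sim = F.cosine_similarity(reprs.unsqueeze(0), reprs.unsqueeze(1), dim=-1)
    sim = sim / temperature
    sim.fill_diagonal_(float("-inf"))    # exclude each view from itself
    weights = sim.softmax(dim=-1)        # (n_views, n_views), rows sum to 1
    return weights @ reprs               # aggregated representations
```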
Efficient Test-Time Model Adaptation without Forgetting
Test-time adaptation (TTA) seeks to tackle potential distribution shifts
between training and testing data by adapting a given model w.r.t. any testing
sample. This task is particularly important for deep models when the test
environment changes frequently. Although some recent attempts have been made to
handle this task, we still face two practical challenges: 1) existing methods
have to perform backward computation for each test sample, resulting in an
unbearable prediction cost for many applications; 2) while existing TTA
solutions can significantly improve the test performance on out-of-distribution
data, they often suffer from severe performance degradation on in-distribution
data after TTA (known as catastrophic forgetting). In this paper, we point out
that not all the test samples contribute equally to model adaptation, and
high-entropy ones may lead to noisy gradients that could disrupt the model.
Motivated by this, we propose an active sample selection criterion to identify
reliable and non-redundant samples, on which the model is updated to minimize
the entropy loss for test-time adaptation. Furthermore, to alleviate the
forgetting issue, we introduce a Fisher regularizer to constrain important
model parameters from drastic changes, where the Fisher importance is estimated
from test samples with generated pseudo labels. Extensive experiments on
CIFAR-10-C, ImageNet-C, and ImageNet-R verify the effectiveness of our proposed
method.
Comment: 15 pages, conference
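A minimal sketch of the entropy-based selection criterion follows; the threshold of 0.4·ln(C) and the confidence weighting are assumptions drawn from common practice, and the redundancy filter and Fisher regularizer are omitted.

```python
import torch

def reliable_entropy_loss(logits, margin=0.4):
    """Keep only low-entropy (reliable) test samples and weight their
    entropy loss by confidence; high-entropy samples are skipped so
    their noisy gradients never touch the model."""
    num_classes = logits.shape[-1]
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    e0 = margin * torch.log(torch.tensor(float(num_classes)))  # threshold
    keep = entropy < e0
    if not keep.any():
        return logits.sum() * 0.0          # nothing reliable in this batch
    weights = torch.exp(e0 - entropy[keep])  # confident samples weigh more
    return (weights.detach() * entropy[keep]).mean()
```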
Privacy-Preserving Face Recognition Using Random Frequency Components
The ubiquitous use of face recognition has sparked increasing privacy
concerns, as unauthorized access to sensitive face images could compromise the
information of individuals. This paper presents an in-depth study of
protecting the visual information of face images and of impeding its recovery.
Drawing on the perceptual disparity between humans and models, we propose to
conceal visual information by pruning human-perceivable low-frequency
components. For impeding recovery, we first elucidate the seeming paradox
between reducing model-exploitable information and retaining high recognition
accuracy. Based on recent theoretical insights and our observation on model
attention, we propose a solution to this dilemma by advocating for the training
and inference of recognition models on randomly selected frequency components.
We distill our findings into a novel privacy-preserving face recognition
method, PartialFace. Extensive experiments demonstrate that PartialFace
effectively balances privacy protection goals and recognition accuracy. Code is
available at: https://github.com/Tencent/TFace.
Comment: ICCV 2023
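For intuition, the sketch below shows the kind of frequency-domain preprocessing the method builds on: a block-wise DCT, pruning of human-perceivable low-frequency channels, and a random subset of the remaining channels per inference. The block size, channel counts, and zig-zag approximation are all assumptions, not PartialFace's exact pipeline.

```python
import numpy as np
from scipy.fft import dctn  # block-wise 2D DCT

def random_frequency_channels(img, block=8, drop_low=3, keep=24, rng=None):
    """img: (H, W) grayscale array with H, W divisible by `block`.
    Returns a (H/block, W/block, keep) tensor of randomly selected
    mid/high-frequency DCT channels, with low frequencies pruned."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape
    blocks = img.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    coeffs = dctn(blocks, axes=(-2, -1), norm="ortho")       # per-block DCT
    channels = coeffs.reshape(h // block, w // block, block * block)
    # approximate zig-zag order: sort flat indices by row + column sum,
    # so low indices correspond to low spatial frequencies
    order = np.argsort([i // block + i % block for i in range(block * block)])
    channels = channels[..., order]
    usable = channels[..., drop_low:]                        # prune low freq
    idx = rng.choice(usable.shape[-1], size=keep, replace=False)
    return usable[..., idx]                                  # random subset
```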